Selection and replacement algorithms for memory performance improvement in Spark

نویسندگان

  • Mingxing Duan
  • Kenli Li
  • Zhuo Tang
  • Guoqing Xiao
  • Keqin Li
چکیده

As a parallel computation framework, Spark can cache repeatedly resilient distribution datasets (RDDs) partitions in different nodes to speed up the process of computation. However, Spark does not have a good mechanism to select reasonable RDDs to cache their partitions in limited memory. In this paper, we propose a novel selection algorithm, by which Spark can automatically select the RDDs to cache their partitions in memory according to the number of use for RDDs. Our selection algorithm speeds up iterative computations. Nevertheless, when many new RDDs are chosen to cache their partitions in memory while limited memory has been full of them, the system will adopt the least recently used (LRU) replacement algorithm. However, the LRU algorithm only considers whether the RDDs partitions are recently used while ignoring other factors such as the computation cost and so on. We also put forward a novel replacement algorithm called weight replacement (WR) algorithm, which takes comprehensive consideration of the partitions computation cost, the number of use for partitions, and the sizes of the partitions. Experiment results show that with our selection algorithm, Spark calculates faster than without the algorithm, and we find that Spark with WR algorithm shows better performance. Copyright © 2015 John Wiley & Sons, Ltd.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Improvement in WRP Block Replacement Policy with Reviewing and Solving its Problems

One of the most important items for better file system performance is efficient buffering of disk blocks in main memory. Efficient buffering helps to reduce the widespeed gap between main memory and hard disks. In this buffering system, the block replacement policy is one of the most important design decisions that determines which disk block should be replaced when the buffer is full. To o...

متن کامل

An Improvement in WRP Block Replacement Policy with Reviewing and Solving its Problems

One of the most important items for better file system performance is efficient buffering of disk blocks in main memory. Efficient buffering helps to reduce the widespeed gap between main memory and hard disks. In this buffering system, the block replacement policy is one of the most important design decisions that determines which disk block should be replaced when the buffer is full. To o...

متن کامل

The Effects of Spark Training on Visual-Spatial Working Memory Operation in Children with Mental Retardation

Introduction: Mental retarded children who receive a wide range of health services, representing more than two percent of the population. Mental retardation is associated with significant constraints on mental performance and adaptive behavior as well as perceptual and practical skills. According to the studies, one of the important tools that can affect cognitive abilities, such as memory, is ...

متن کامل

A Modified Grey Wolf Optimizer by Individual Best Memory and Penalty Factor for Sonar and Radar Dataset Classification

Meta-heuristic Algorithms (MA) are widely accepted as excellent ways to solve a variety of optimization problems in recent decades. Grey Wolf Optimization (GWO) is a novel Meta-heuristic Algorithm (MA) that has been generated a great deal of research interest due to its advantages such as simple implementation and powerful exploitation. This study proposes a novel GWO-based MA and two extra fea...

متن کامل

A New Design of High-Performance Large-Scale GIS Computing at a Finer Spatial Granularity: A Case Study of Spatial Join with Spark for Sustainability

Sustainability research faces many challenges as respective environmental, urban and regional contexts are experiencing rapid changes at an unprecedented spatial granularity level, which involves growing massive data and the need for spatial relationship detection at a faster pace. Spatial join is a fundamental method for making data more informative with respect to spatial relations. The drama...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Concurrency and Computation: Practice and Experience

دوره 28  شماره 

صفحات  -

تاریخ انتشار 2016